stochastic optimal control
Stochastic Optimal Control Matching
Stochastic optimal control, which has the goal of driving the behavior of noisy systems, is broadly applicable in science, engineering and artificial intelligence. Our work introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for stochastic optimal control that stems from the same philosophy as the conditional score matching loss for diffusion models. That is, the control is learned via a least squares problem by trying to fit a matching vector field. The training loss, which is closely connected to the cross-entropy loss, is optimized with respect to both the control function and a family of reparameterization matrices which appear in the matching vector field. The optimization with respect to the reparameterization matrices aims at minimizing the variance of the matching vector field. Experimentally, our algorithm achieves lower error than all the existing IDO techniques for stochastic optimal control for three out of four control problems, in some cases by an order of magnitude. The key idea underlying SOCM is the path-wise reparameterization trick, a novel technique that may be of independent interest.
Stochastic Optimal Control for Collective Variable Free Sampling of Molecular Transition Paths
We consider the problem of sampling transition paths between two given metastable states of a molecular system, eg. a folded and unfolded protein or products and reactants of a chemical reaction. Due to the existence of high energy barriers separating the states, these transition paths are unlikely to be sampled with standard Molecular Dynamics (MD) simulation. Traditional methods to augment MD with a bias potential to increase the probability of the transition rely on a dimensionality reduction step based on Collective Variables (CVs). Unfortunately, selecting appropriate CVs requires chemical intuition and traditional methods are therefore not always applicable to larger systems. Additionally, when incorrect CVs are used, the bias potential might not be minimal and bias the system along dimensions irrelevant to the transition. Showing a formal relation between the problem of sampling molecular transition paths, the Schrodinger bridge problem and stochastic optimal control with neural network policies, we propose a machine learning method for sampling said transitions. Unlike previous non-machine learning approaches our method, named PIPS, does not depend on CVs. We show that our method successful generates low energy transitions for Alanine Dipeptide as well as the larger Polyproline and Chignolin proteins.
Stochastic Optimal Control for Diffusion Bridges in Function Spaces
Recent advancements in diffusion models and diffusion bridges primarily focus on finite-dimensional spaces, yet many real-world problems necessitate operations in infinite-dimensional function spaces for more natural and interpretable formulations. In this paper, we present a theory of stochastic optimal control (SOC) tailored to infinite-dimensional spaces, aiming to extend diffusion-based algorithms to function spaces. Specifically, we demonstrate how Doob's $h$-transform, the fundamental tool for constructing diffusion bridges, can be derived from the SOC perspective and expanded to infinite dimensions. This expansion presents a challenge, as infinite-dimensional spaces typically lack closed-form densities. Leveraging our theory, we establish that solving the optimal control problem with a specific objective function choice is equivalent to learning diffusion-based generative models. We propose two applications: 1) learning bridges between two infinite-dimensional distributions and 2) generative models for sampling from an infinite-dimensional distribution. Our approach proves effective for diverse problems involving continuous function space representations, such as resolution-free images, time-series data, and probability density functions.
Iterative Tilting for Diffusion Fine-Tuning
Pachebat, Jean, Conforti, Giovanni, Durmus, Alain, Janati, Yazid
We introduce iterative tilting, a gradient-free method for fine-tuning diffusion models toward reward-tilted distributions. The method decomposes a large reward tilt $\exp(λr)$ into $N$ sequential smaller tilts, each admitting a tractable score update via first-order Taylor expansion. This requires only forward evaluations of the reward function and avoids backpropagating through sampling chains. We validate on a two-dimensional Gaussian mixture with linear reward, where the exact tilted distribution is available in closed form.
A Unified and Fast-Sampling Diffusion Bridge Framework via Stochastic Optimal Control
Pan, Mokai, Zhu, Kaizhen, Ma, Yuexin, Fu, Yanwei, Yu, Jingyi, Wang, Jingya, Shi, Ye
Recent advances in diffusion bridge models leverage Doob's $h$-transform to establish fixed endpoints between distributions, demonstrating promising results in image translation and restoration tasks. However, these approaches often produce blurred or excessively smoothed image details and lack a comprehensive theoretical foundation to explain these shortcomings. To address these limitations, we propose UniDB, a unified and fast-sampling framework for diffusion bridges based on Stochastic Optimal Control (SOC). We reformulate the problem through an SOC-based optimization, proving that existing diffusion bridges employing Doob's $h$-transform constitute a special case, emerging when the terminal penalty coefficient in the SOC cost function tends to infinity. By incorporating a tunable terminal penalty coefficient, UniDB achieves an optimal balance between control costs and terminal penalties, substantially improving detail preservation and output quality. To avoid computationally expensive costs of iterative Euler sampling methods in UniDB, we design a training-free accelerated algorithm by deriving exact closed-form solutions for UniDB's reverse-time SDE. It is further complemented by replacing conventional noise prediction with a more stable data prediction model, along with an SDE-Corrector mechanism that maintains perceptual quality for low-step regimes, effectively reducing error accumulation. Extensive experiments across diverse image restoration tasks validate the superiority and adaptability of the proposed framework, bridging the gap between theoretical generality and practical efficiency. Our code is available online https://github.com/2769433owo/UniDB-plusplus.
Stochastic Optimal Control Matching
Stochastic optimal control, which has the goal of driving the behavior of noisy systems, is broadly applicable in science, engineering and artificial intelligence. Our work introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for stochastic optimal control that stems from the same philosophy as the conditional score matching loss for diffusion models.
Stochastic Optimal Control via Measure Relaxations
Buehrle, Etienne, Stiller, Christoph
The optimal control problem of stochastic systems is commonly solved via robust [2, 21] or scenario-based [7, 19, 17] optimization methods, which are both challenging to scale to long optimization horizons due to their open-loop nature. Dynamic programming formulations [4], while applicable to stochastic systems, typically involve nonconvex optimization problems and do not support specifying the terminal distribution. Polynomial optimization has been proposed for deterministic nonlinear [11] and hybrid systems [16]. We extend the method to stochastic systems using a weak formulation of the Fokker-Planck equation. As a cost function, we propose to use the Christoffel polynomial, which can be estimated from data.